Novel sulfurylase-luciferase fusion proteins and thermostable sulfurylase

ABSTRACT

The present invention relates to the field of recombinant DNA technology. More specifically, this invention relates to fusion proteins comprising an ATP generating polypeptide joined to a polypeptide that converts ATP into a detectable entity. Accordingly, this invention focuses on sulfurylase-luciferase fusion proteins. This invention also relates to pharmaceutical compositions containing the fusion proteins and methods for using them.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 10/154,515, filed May 23, 2002, which is a continuation-in-part of U.S. patent application Ser. No. 10/122,706, filed Apr. 11, 2002, which claims the benefit of priority to U.S. Patent Application 60/335,949, filed Oct. 30, 2001, and U.S. Patent Application 60/349,076, filed Jan. 16, 2002. All patents, patent applications and references cited in this specification are hereby incorporated by reference.

FIELD OF THE INVENTION

The invention relates generally to fusion proteins that are useful as reporter proteins, in particular to fusion proteins of ATP sulfurylase and luciferase which are utilized to achieve an efficient conversion of pyrophosphate (PPi) to light. This invention also relates to a novel thermostable sulfurylase which can be used in the detection of inorganic pyrophosphate, particularly in the sequencing of nucleic acid.

BACKGROUND OF THE INVENTION

ATP sulfurylase has been identified as being involved in sulfur metabolism. It catalyzes the initial reaction in the metabolism of inorganic sulfate (SO₄²⁻) (see, e.g., Robbins and Lipmann, 1958. J. Biol. Chem. 233: 686-690; Hawes and Nicholas, 1973. Biochem. J. 133: 541-550). In this reaction SO₄²⁻ is activated to adenosine 5′-phosphosulfate (APS). ATP sulfurylase is also commonly used in pyrophosphate sequencing methods. In order to convert pyrophosphate (PPi) generated from the addition of dNMP to a growing DNA chain to light, PPi must first be converted to ATP by ATP sulfurylase.

ATP produced by an ATP sulfurylase can also be hydrolyzed using enzymatic reactions to generate light. Light-emitting chemical reactions (i.e., chemiluminescence) and biological reactions (i.e., bioluminescence) are widely used in analytical biochemistry for sensitive measurements of various metabolites. In bioluminescent reactions, the chemical reaction that leads to the emission of light is enzyme-catalyzed. For example, the luciferin-luciferase system allows for specific assay of ATP. Thus, both ATP generating enzymes, such as ATP sulfurylase, and light emitting enzymes, such as luciferase, could be useful in a number of different assays for the detection and/or concentration of specific substances in fluids and gases. Since high physical and chemical stability is sometimes required for enzymes involved in sequencing reactions, a thermostable enzyme is desirable.

Because the product of the sulfurylase reaction is consumed by luciferase, proximity between these two enzymes by covalently linking the two enzymes in the form of a fusion protein would provide for a more efficient use of the substrate. Substrate channeling is a phenomenon in which substrates are efficiently delivered from enzyme to enzyme without equilibration with other pools of the same substrates. In effect, this creates local pools of metabolites at high concentrations relative to those found in other areas of the cell. Therefore, a fusion of an ATP generating polypeptide and an ATP converting peptide could benefit from the phenomenon of substrate channeling and would reduce production costs and increase the number of enzymatic reactions that occur during a given time period.

All patents and publications cited throughout the specification are hereby incorporated by reference into this specification in their entirety in order to more fully describe the state of the art to which this invention pertains.

SUMMARY OF THE INVENTION

The invention provides a fusion protein comprising an ATP generating polypeptide bound to a polypeptide which converts ATP into an entity which is detectable. In one aspect, the invention provides a fusion protein comprising a sulfurylase polypeptide bound to a luciferase polypeptide. This invention provides a nucleic acid that comprises an open reading frame that encodes a novel thermostable sulfurylase polypeptide. In a further aspect, the invention provides for a fusion protein comprising a thermostable sulfurylase joined to at least one affinity tag.

In another aspect, the invention provides a recombinant polynucleotide that comprises a coding sequence for a fusion protein having a sulfurylase polypeptide sequence joined to a luciferase polypeptide sequence. In a further aspect, the invention provides an expression vector for expressing a fusion protein. The expression vector comprises a coding sequence for a fusion protein having: (i) a regulatory sequence, (ii) a first polypeptide sequence of an ATP generating polypeptide and (iii) a second polypeptide sequence that converts ATP to an entity which is detectable. In an additional embodiment, the fusion protein comprises a sulfurylase polypeptide and a luciferase polypeptide. In another aspect, the invention provides a transformed host cell which comprises the expression vector. In an additional aspect, the invention provides a fusion protein bound to a mobile support. The invention also includes a kit comprising a sulfurylase-luciferase fusion protein expression vector.

The invention also includes a method for determining the nucleic acid sequence in a template nucleic acid polymer, comprising: (a) introducing the template nucleic acid polymer into a polymerization environment in which the nucleic acid polymer will act as a template polymer for the synthesis of a complementary nucleic acid polymer when nucleotides are added; (b) successively providing to the polymerization environment a series of feedstocks, each feedstock comprising a nucleotide selected from among the nucleotides from which the complementary nucleic acid polymer will be formed, such that if the nucleotide in the feedstock is complementary to the next nucleotide in the template polymer to be sequenced said nucleotide will be incorporated into the complementary polymer and inorganic pyrophosphate will be released; (c) separately recovering each of the feedstocks from the polymerization environment; and (d) measuring the amount of PPi with an ATP generating polypeptide-ATP converting polypeptide fusion protein in each of the recovered feedstocks to determine the identity of each nucleotide in the complementary polymer and thus the sequence of the template polymer. In one embodiment, the amount of inorganic pyrophosphate is measured by the steps of: (a) adding adenosine-5′-phosphosulfate to the feedstock; (b) combining the recovered feedstock containing adenosine-5′-phosphosulfate with an ATP generating polypeptide-ATP converting polypeptide fusion protein such that any inorganic pyrophosphate in the recovered feedstock and the adenosine-5′-phosphosulfate will react to form ATP and sulfate; (c) combining the ATP and sulfate-containing feedstock with luciferin in the presence of oxygen such that the ATP is consumed to produce AMP, inorganic pyrophosphate, carbon dioxide and light; and (d) measuring the amount of light produced.

In another aspect, the invention includes a method wherein each feedstock comprises adenosine-5′-phosphosulfate and luciferin in addition to the selected nucleotide base, and the amount of inorganic pyrophosphate is determined by reacting the inorganic pyrophosphate feedstock with an ATP generating polypeptide-ATP converting polypeptide fusion protein thereby producing light in an amount proportional to the amount of inorganic pyrophosphate, and measuring the amount of light produced.
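The feedstock-by-feedstock readout described above can be illustrated with a minimal, idealized simulation. It is only a sketch: the function names, the flow order "TACG" and the assumption that one incorporated nucleotide yields exactly one unit of light are hypothetical simplifications, and enzyme kinetics, noise and incomplete extension are ignored.

```python
# Idealized simulation of sequencing by successive nucleotide feedstocks.
# All names are hypothetical; this is not the procedure of the specification.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def simulate_flows(template, flow_order="TACG", max_cycles=50):
    """Return a list of (nucleotide, light_units), one entry per feedstock.

    `template` is the unpaired template region, written in the order in which
    the polymerase reads it. Each incorporated nucleotide is assumed to release
    one PPi, and light is assumed to be exactly proportional to PPi.
    """
    flows = []
    pos = 0
    for _ in range(max_cycles):
        for nucleotide in flow_order:
            incorporated = 0
            # Extend while the offered nucleotide pairs with the next template base.
            while pos < len(template) and COMPLEMENT[template[pos]] == nucleotide:
                incorporated += 1
                pos += 1
            flows.append((nucleotide, incorporated))
            if pos == len(template):
                return flows
    return flows


def decode(flows):
    """Rebuild the synthesized complementary strand from the per-feedstock signals."""
    return "".join(nucleotide * units for nucleotide, units in flows)


if __name__ == "__main__":
    flows = simulate_flows("ATTGC")
    print(flows)
    print(decode(flows))
```

In this sketch a feedstock that gives no light simply contributes nothing to the decoded strand, while a run of identical template bases appears as a proportionally larger signal in a single feedstock.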

In another aspect, the invention provides a method for sequencing a nucleic acid, the method comprising: (a) providing one or more nucleic acid anchor primers; (b) providing a plurality of single-stranded circular nucleic acid templates disposed within a plurality of cavities on a planar surface, each cavity forming an analyte reaction chamber, wherein the reaction chambers have a center to center spacing of between 5 to 200 μm; (c) annealing an effective amount of the nucleic acid anchor primer to at least one of the single-stranded circular templates to yield a primed anchor primer-circular template complex; (d) combining the primed anchor primer-circular template complex with a polymerase to form an extended anchor primer covalently linked to multiple copies of a nucleic acid complementary to the circular nucleic acid template; (e) annealing an effective amount of a sequencing primer to one or more copies of said covalently linked complementary nucleic acid; (f) extending the sequencing primer with a polymerase and a predetermined nucleotide triphosphate to yield a sequencing product and, if the predetermined nucleotide triphosphate is incorporated onto the 3′ end of said sequencing primer, a sequencing reaction byproduct; and (g) identifying the sequencing reaction byproduct with the use of an ATP generating polypeptide-ATP converting polypeptide fusion protein, thereby determining the sequence of the nucleic acid.

In one aspect, the invention provides a method for sequencing a nucleic acid, the method comprising: (a) providing at least one nucleic acid anchor primer; (b) providing a plurality of single-stranded circular nucleic acid templates in an array having at least 400,000 discrete reaction sites; (c) annealing a first amount of the nucleic acid anchor primer to at least one of the single-stranded circular templates to yield a primed anchor primer-circular template complex; (d) combining the primed anchor primer-circular template complex with a polymerase to form an extended anchor primer covalently linked to multiple copies of a nucleic acid complementary to the circular nucleic acid template; (e) annealing a second amount of a sequencing primer to one or more copies of the covalently linked complementary nucleic acid; (f) extending the sequencing primer with a polymerase and a predetermined nucleotide triphosphate to yield a sequencing product and, when the predetermined nucleotide triphosphate is incorporated onto the 3′ end of the sequencing primer, to yield a sequencing reaction byproduct; and (g) identifying the sequencing reaction byproduct with the use of an ATP generating polypeptide-ATP converting polypeptide fusion protein, thereby determining the sequence of the nucleic acid at each reaction site that contains a nucleic acid template.

In another aspect, the invention includes a method of determining the base sequence of a plurality of nucleotides on an array, the method comprising the steps of: (a) providing a plurality of sample DNAs, each disposed within a plurality of cavities on a planar surface, each cavity forming an analyte reaction chamber, wherein the reaction chambers have a center to center spacing of between 5 to 200 μm; (b) adding an activated nucleotide 5′-triphosphate precursor of one known nitrogenous base to a reaction mixture in each reaction chamber, each reaction mixture comprising a template-directed nucleotide polymerase and a single-stranded polynucleotide template hybridized to a complementary oligonucleotide primer strand at least one nucleotide residue shorter than the templates to form at least one unpaired nucleotide residue in each template at the 3′-end of the primer strand, under reaction conditions which allow incorporation of the activated nucleoside 5′-triphosphate precursor onto the 3′-end of the primer strands, provided the nitrogenous base of the activated nucleoside 5′-triphosphate precursor is complementary to the nitrogenous base of the unpaired nucleotide residue of the templates; (c) determining whether or not the nucleoside 5′-triphosphate precursor was incorporated into the primer strands through detection of a sequencing byproduct with an ATP generating polypeptide-ATP converting polypeptide fusion protein, thus indicating that the unpaired nucleotide residue of the template has a nitrogenous base composition that is complementary to that of the incorporated nucleoside 5′-triphosphate precursor; (d) sequentially repeating steps (b) and (c), wherein each sequential repetition adds, and detects the incorporation of, one type of activated nucleoside 5′-triphosphate precursor of known nitrogenous base composition; and (e) determining the base sequence of the unpaired nucleotide residues of the template in each reaction chamber from the sequence of incorporation of said nucleoside precursors.

In one aspect, the invention includes a method for determining the nucleic acid sequence in a template nucleic acid polymer, comprising: (a) introducing a plurality of template nucleic acid polymers into a plurality of cavities on a planar surface, each cavity forming an analyte reaction chamber, wherein the reaction chambers have a center to center spacing of between 5 to 200 μm, each reaction chamber having a polymerization environment in which the nucleic acid polymer will act as a template polymer for the synthesis of a complementary nucleic acid polymer when nucleotides are added; (b) successively providing to the polymerization environment a series of feedstocks, each feedstock comprising a nucleotide selected from among the nucleotides from which the complementary nucleic acid polymer will be formed, such that if the nucleotide in the feedstock is complementary to the next nucleotide in the template polymer to be sequenced said nucleotide will be incorporated into the complementary polymer and inorganic pyrophosphate will be released; and (c) detecting the formation of inorganic pyrophosphate with an ATP generating polypeptide-ATP converting polypeptide fusion protein to determine the identity of each nucleotide in the complementary polymer and thus the sequence of the template polymer.

In one aspect, the invention provides a method of identifying the base in a target position in a DNA sequence of sample DNA including the steps comprising: (a) disposing sample DNA within a plurality of cavities on a planar surface, each cavity forming an analyte reaction chamber, wherein the reaction chambers have a center to center spacing of between 5 to 200 μm, said DNA being rendered single stranded either before or after being disposed in the reaction chambers; (b) providing an extension primer which hybridizes to said immobilized single-stranded DNA at a position immediately adjacent to said target position; (c) subjecting said immobilized single-stranded DNA to a polymerase reaction in the presence of a predetermined nucleotide triphosphate, wherein if the predetermined nucleotide triphosphate is incorporated onto the 3′ end of said sequencing primer then a sequencing reaction byproduct is formed; and (d) identifying the sequencing reaction byproduct with an ATP generating polypeptide-ATP converting polypeptide fusion protein, thereby determining the nucleotide complementary to the base at said target position.

The invention also includes a method of identifying a base at a target position in a sample DNA sequence comprising: (a) providing sample DNA disposed within a plurality of cavities on a planar surface, each cavity forming an analyte reaction chamber, wherein the reaction chambers have a center to center spacing of between 5 to 200 μm, said DNA being rendered single stranded either before or after being disposed in the reaction chambers; (b) providing an extension primer which hybridizes to the sample DNA immediately adjacent to the target position; (c) subjecting the sample DNA sequence and the extension primer to a polymerase reaction in the presence of a nucleotide triphosphate whereby the nucleotide triphosphate will only become incorporated and release pyrophosphate (PPi) if it is complementary to the base in the target position, said nucleotide triphosphate being added either to separate aliquots of sample-primer mixture or successively to the same sample-primer mixture; (d) detecting the release of PPi with an ATP generating polypeptide-ATP converting polypeptide fusion protein to indicate which nucleotide is incorporated.

In one aspect, the invention provides a method of identifying a base at a target position in a single-stranded sample DNA sequence, the method comprising: (a) providing an extension primer which hybridizes to sample DNA immediately adjacent to the target position, said sample DNA disposed within a plurality of cavities on a planar surface, each cavity forming an analyte reaction chamber, wherein the reaction chambers have a center to center spacing of between 5 to 200 μm, said DNA being rendered single stranded either before or after being disposed in the reaction chambers; (b) subjecting the sample DNA and extension primer to a polymerase reaction in the presence of a predetermined deoxynucleotide or dideoxynucleotide whereby the deoxynucleotide or dideoxynucleotide will only become incorporated and release pyrophosphate (PPi) if it is complementary to the base in the target position, said predetermined deoxynucleotides or dideoxynucleotides being added either to separate aliquots of sample-primer mixture or successively to the same sample-primer mixture; (c) detecting any release of PPi with an ATP generating polypeptide-ATP converting polypeptide fusion protein to indicate which deoxynucleotide or dideoxynucleotide is incorporated; characterized in that the PPi-detection enzyme(s) are included in the polymerase reaction step and in that in place of deoxy- or dideoxy adenosine triphosphate (ATP) a dATP or ddATP analogue is used which is capable of acting as a substrate for a polymerase but incapable of acting as a substrate for said PPi-detection enzyme.

In another aspect, the invention includes a method of determining the base sequence of a plurality of nucleotides on an array, the method comprising: (a) providing a plurality of sample DNAs, each disposed within a plurality of cavities on a planar surface, each cavity forming an analyte reaction chamber, wherein the reaction chambers have a center to center spacing of between 5 to 200 μm, (b) converting PPi into light with an ATP generating polypeptide-ATP converting polypeptide fusion protein; (c) detecting the light level emitted from a plurality of reaction sites on respective portions of an optically sensitive device; (d) converting the light impinging upon each of said portions of said optically sensitive device into an electrical signal which is distinguishable from the signals from all of said other regions; (e) determining a light intensity for each of said discrete regions from the corresponding electrical signal; (f) recording the variations of said electrical signals with time.

In one aspect, the invention provides a method for sequencing a nucleic acid, the method comprising: (a) providing one or more nucleic acid anchor primers; (b) providing a plurality of single-stranded circular nucleic acid templates disposed within a plurality of cavities on a planar surface, each cavity forming an analyte reaction chamber, wherein the reaction chambers have a center to center spacing of between 5 to 200 μm; (c) converting PPi into a detectable entity with the use of an ATP generating polypeptide-ATP converting polypeptide fusion protein; (d) detecting the light level emitted from a plurality of reaction sites on respective portions of an optically sensitive device; (e) converting the light impinging upon each of said portions of said optically sensitive device into an electrical signal which is distinguishable from the signals from all of said other regions; (f) determining a light intensity for each of said discrete regions from the corresponding electrical signal; (g) recording the variations of said electrical signals with time.

In another aspect, the invention includes a method for sequencing a nucleic acid, the method comprising: (a) providing at least one nucleic acid anchor primer; (b) providing a plurality of single-stranded circular nucleic acid templates in an array having at least 400,000 discrete reaction sites; (c) converting PPi into a detectable entity with an ATP generating polypeptide-ATP converting polypeptide fusion protein; (d) detecting the light level emitted from a plurality of reaction sites on respective portions of an optically sensitive device; (e) converting the light impinging upon each of said portions of said optically sensitive device into an electrical signal which is distinguishable from the signals from all of said other regions; (f) determining a light intensity for each of said discrete regions from the corresponding electrical signal; (g) recording the variations of said electrical signals with time.
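The detection steps recited in the preceding aspects (detecting light at each reaction site, converting it to a distinguishable electrical signal, determining an intensity per region, and recording the signal over time) can be pictured with a minimal sketch. The data layout, the function names and the per-site background subtraction are assumptions made for illustration only; the specification does not prescribe any of them.

```python
# Hypothetical sketch: frames[t][s] is the raw electrical signal (arbitrary
# units) for reaction site s at time point t. Background subtraction is an
# illustrative assumption, not a step taken from the specification.


def site_traces(frames, background):
    """Return one background-corrected intensity trace per reaction site."""
    n_sites = len(background)
    traces = [[] for _ in range(n_sites)]
    for frame in frames:
        if len(frame) != n_sites:
            raise ValueError("each frame needs exactly one reading per site")
        for site, reading in enumerate(frame):
            # Clamp at zero so readings below background are not reported
            # as negative light.
            traces[site].append(max(reading - background[site], 0))
    return traces


def first_signal_times(traces, threshold):
    """For each site, the first time index whose intensity exceeds threshold, else None."""
    return [
        next((t for t, value in enumerate(trace) if value > threshold), None)
        for trace in traces
    ]


if __name__ == "__main__":
    frames = [[10, 12], [50, 11], [30, 13]]   # three time points, two sites
    traces = site_traces(frames, background=[10, 12])
    print(traces)
    print(first_signal_times(traces, threshold=5))
```

Keeping one trace per site mirrors the requirement that the signal from each region remain distinguishable from the signals of all other regions.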

In another aspect, the invention includes an isolated polypeptide comprising an amino acid sequence selected from the group consisting of: (a) a mature form of an amino acid sequence of SEQ ID NO: 2; (b) a variant of a mature form of an amino acid sequence of SEQ ID NO: 2; (c) an amino acid sequence of SEQ ID NO: 2; (d) a variant of an amino acid sequence of SEQ ID NO: 2, wherein one or more amino acid residues in said variant differs from the amino acid sequence of said mature form, provided that said variant differs in no more than 5% of amino acid residues from said amino acid sequence; and (e) at least one conservative amino acid substitution to the amino acid sequences in (a), (b), (c) or (d). The invention also includes an antibody that binds immunospecifically to the polypeptide of (a), (b), (c) or (d).

In another aspect, the invention includes an isolated nucleic acid molecule comprising a nucleic acid sequence encoding a polypeptide comprising an amino acid sequence selected from the group consisting of: (a) a mature form of an amino acid sequence of SEQ ID NO: 2; (b) a variant of a mature form of an amino acid sequence of SEQ ID NO: 2, wherein one or more amino acid residues in said variant differs from the amino acid sequence of said mature form, provided that said variant differs in no more than 5% of the amino acid residues from the amino acid sequence of said mature form; (c) an amino acid sequence of SEQ ID NO: 2; (d) a variant of an amino acid sequence of SEQ ID NO: 2, wherein one or more amino acid residues in said variant differs from the amino acid sequence of said mature form, provided that said variant differs in no more than 15% of amino acid residues from said amino acid sequence; (e) a nucleic acid fragment encoding at least a portion of a polypeptide comprising an amino acid sequence of SEQ ID NO: 2, or a variant of said polypeptide, wherein one or more amino acid residues in said variant differs from the amino acid sequence of said mature form, provided that said variant differs in no more than 5% of amino acid residues from said amino acid sequence; and (f) a nucleic acid molecule comprising the complement of (a), (b), (c), (d) or (e).
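The percentage limits on variants recited above can be made concrete with a small calculation. The sketch below is a simplification under stated assumptions: it compares two sequences of equal length position by position, whereas a real comparison would first require an alignment; the function names and the example peptide are hypothetical and are not taken from SEQ ID NO: 2.

```python
# Hypothetical helper: percentage of positions at which a variant differs
# from a reference. Assumes the two sequences are already aligned and of
# equal length (no insertions or deletions).


def percent_differing(reference, variant):
    if len(reference) != len(variant):
        raise ValueError("sequences must be pre-aligned and of equal length")
    if not reference:
        raise ValueError("sequences must not be empty")
    mismatches = sum(a != b for a, b in zip(reference, variant))
    return 100.0 * mismatches / len(reference)


def within_limit(reference, variant, limit_percent=5.0):
    """True if the variant differs in no more than limit_percent of residues."""
    return percent_differing(reference, variant) <= limit_percent


if __name__ == "__main__":
    reference = "MKTAYIAKQRQISFVKSHFS"      # arbitrary 20-residue example
    one_change = "A" + reference[1:]         # 1 of 20 residues differs
    two_changes = "AA" + reference[2:]       # 2 of 20 residues differ
    print(percent_differing(reference, one_change), within_limit(reference, one_change))
    print(percent_differing(reference, two_changes), within_limit(reference, two_changes))
```

The limit is a parameter so that the same check can be applied to the different percentages recited for the different members of the group.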

In a further aspect, the invention provides a nucleic acid molecule wherein the nucleic acid molecule comprises a nucleotide sequence selected from the group consisting of: (a) a first nucleotide sequence comprising a coding sequence differing by one or more nucleotide sequences from a coding sequence encoding said amino acid sequence, provided that no more than 20% of the nucleotides in the coding sequence in said first nucleotide sequence differ from said coding sequence; (b) an isolated second polynucleotide that is a complement of the first polynucleotide; and (c) a nucleic acid fragment of (a) or (b). The invention also includes a vector comprising the nucleic acid molecule of (a) or (b). In another aspect, the invention includes a cell comprising the vector.

In a further aspect, the invention includes a method for determining the nucleic acid sequence in a template nucleic acid polymer, comprising: (a) introducing the template nucleic acid polymer into a polymerization environment in which the nucleic acid polymer will act as a template polymer for the synthesis of a complementary nucleic acid polymer when nucleotides are added; (b) successively providing to the polymerization environment a series of feedstocks, each feedstock comprising a nucleotide selected from among the nucleotides from which the complementary nucleic acid polymer will be formed, such that if the nucleotide in the feedstock is complementary to the next nucleotide in the template polymer to be sequenced said nucleotide will be incorporated into the complementary polymer and inorganic pyrophosphate will be released; (c) separately recovering each of the feedstocks from the polymerization environment; and (d) measuring the amount of PPi with an ATP sulfurylase and a luciferase in each of the recovered feedstocks to determine the identity of each nucleotide in the complementary polymer and thus the sequence of the template polymer.

In another aspect, the invention provides a method for sequencing a nucleic acid, the method comprising: (a) providing one or more nucleic acid anchor primers; (b) providing a plurality of single-stranded circular nucleic acid templates disposed within a plurality of cavities in an array on a planar surface, each cavity forming an analyte reaction chamber, wherein the reaction chambers have a center to center spacing of between 5 to 200 μm and at least 400,000 discrete sites; (c) annealing an effective amount of the nucleic acid anchor primer to at least one of the single-stranded circular templates to yield a primed anchor primer-circular template complex; (d) combining the primed anchor primer-circular template complex with a polymerase to form an extended anchor primer covalently linked to multiple copies of a nucleic acid complementary to the circular nucleic acid template; (e) annealing an effective amount of a sequencing primer to one or more copies of said covalently linked complementary nucleic acid; (f) extending the sequencing primer with a polymerase and a predetermined nucleotide triphosphate to yield a sequencing product and, if the predetermined nucleotide triphosphate is incorporated onto the 3′ end of said sequencing primer, a sequencing reaction byproduct; and (g) identifying the sequencing reaction byproduct with the use of an ATP sulfurylase and a luciferase, thereby determining the sequence of the nucleic acid.

In another aspect, the invention provides a method for sequencing a nucleic acid, the method comprising: (a) providing at least one nucleic acid anchor primer; (b) providing a plurality of single-stranded circular nucleic acid templates in an array having at least 400,000 discrete reaction sites; (c) annealing a first amount of the nucleic acid anchor primer to at least one of the single-stranded circular templates to yield a primed anchor primer-circular template complex; (d) combining the primed anchor primer-circular template complex with a polymerase to form an extended anchor primer covalently linked to multiple copies of a nucleic acid complementary to the circular nucleic acid template; (e) annealing a second amount of a sequencing primer to one or more copies of the covalently linked complementary nucleic acid; (f) extending the sequencing primer with a polymerase and a predetermined nucleotide triphosphate to yield a sequencing product and, when the predetermined nucleotide triphosphate is incorporated onto the 3′ end of the sequencing primer, to yield a sequencing reaction byproduct; and (g) identifying the sequencing reaction byproduct with the use of a thermostable sulfurylase and a luciferase, thereby determining the sequence of the nucleic acid at each reaction site that contains a nucleic acid template.

In a further aspect, the invention includes a method of determining the base sequence of a plurality of nucleotides on an array, the method comprising: (a) providing a plurality of sample DNAs, each disposed within a plurality of cavities on a planar surface, each cavity forming an analyte reaction chamber, wherein the reaction chambers have a center to center spacing of between 5 to 200 μm; (b) adding an activated nucleotide 5′-triphosphate precursor of one known nitrogenous base to a reaction mixture in each reaction chamber, each reaction mixture comprising a template-directed nucleotide polymerase and a single-stranded polynucleotide template hybridized to a complementary oligonucleotide primer strand at least one nucleotide residue shorter than the templates to form at least one unpaired nucleotide residue in each template at the 3′-end of the primer strand, under reaction conditions which allow incorporation of the activated nucleoside 5′-triphosphate precursor onto the 3′-end of the primer strands, provided the nitrogenous base of the activated nucleoside 5′-triphosphate precursor is complementary to the nitrogenous base of the unpaired nucleotide residue of the templates; (c) detecting whether or not the nucleoside 5′-triphosphate precursor was incorporated into the primer strands through detection of a sequencing byproduct with a thermostable sulfurylase and luciferase, thus indicating that the unpaired nucleotide residue of the template has a nitrogenous base composition that is complementary to that of the incorporated nucleoside 5′-triphosphate precursor; (d) sequentially repeating steps (b) and (c), wherein each sequential repetition adds, and detects the incorporation of, one type of activated nucleoside 5′-triphosphate precursor of known nitrogenous base composition; and (e) determining the base sequence of the unpaired nucleotide residues of the template in each reaction chamber from the sequence of incorporation of said nucleoside precursors.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is one embodiment for a cloning strategy for obtaining the luciferase-sulfurylase sequence.

FIGS. 2A and 2B show the preparative agarose gel of luciferase and sulfurylase as well as sulfurylase-luciferase fusion genes.

FIG. 3 shows the results of experiments to determine the activity of the luciferase-sulfurylase fusion protein on NTA-agarose and MPG-SA solid supports.

DETAILED DESCRIPTION OF THE INVENTION

This invention provides a fusion protein containing an ATP generating polypeptide bound to a polypeptide which converts ATP into an entity which is detectable. As used herein, the term “fusion protein” refers to a chimeric protein containing an exogenous protein fragment joined to another exogenous protein fragment. The fusion protein could include an affinity tag to allow attachment of the protein to a solid support or to allow for purification of the recombinant fusion protein from the host cell or culture supernatant, or both.

In a preferred embodiment, the ATP generating polypeptide and ATP converting polypeptide are from a eukaryote or a prokaryote. The eukaryote could be an animal, plant, fungus or yeast. In some embodiments, the animal is a mammal, rodent, insect, worm, mollusk, reptile, bird or amphibian. Plant sources of the polypeptides include but are not limited to Arabidopsis thaliana, Brassica napus, Allium sativum, Amaranthus caudatus, Hevea brasiliensis, Hordeum vulgare, Lycopersicon esculentum, Nicotiana tabacum, Oryza sativa, Pisum sativum, Populus trichocarpa, Solanum tuberosum, Secale cereale, Sambucus nigra, Ulmus americana or Triticum aestivum. Examples of fungi include but are not limited to Penicillium chrysogenum, Stachybotrys chartarum, Aspergillus fumigatus, Podospora anserina and Trichoderma reesei. Examples of sources of yeast include but are not limited to Saccharomyces cerevisiae, Candida tropicalis, Candida lipolytica, Candida utilis, Kluyveromyces lactis, Schizosaccharomyces pombe, Yarrowia lipolytica, Candida spp., Pichia spp. and Hansenula spp.

The prokaryote source could be bacteria or archaea. In some embodiments, the bacterium is E. coli, B. subtilis, Streptococcus gordonii, flavobacteria or green sulfur bacteria. In other embodiments, the archaeon is Sulfolobus, Thermococcus, Methanobacterium, Halococcus, Halobacterium or Methanococcus jannaschii.

The ATP generating polypeptide can be an ATP sulfurylase, hydrolase or an ATP synthase. In a preferred embodiment, the ATP generating polypeptide is ATP sulfurylase. In one embodiment, the ATP sulfurylase is a thermostable sulfurylase cloned from Bacillus stearothermophilus (Bst) and comprising the nucleotide sequence of SEQ ID NO:1. This putative gene was cloned using genomic DNA acquired from ATCC (Cat. No. 12980D). The gene is shown to code for a functional ATP sulfurylase that can be expressed as a fusion protein with an affinity tag. The disclosed Bst sulfurylase nucleic acid (SEQ ID NO:1) includes the 1247-nucleotide sequence. An open reading frame (ORF) for the mature protein was identified beginning with an ATG codon at nucleotides 1-3 and ending with a TAA codon at nucleotides 1159-1161. The start and stop codons of the open reading frame are highlighted in bold type. The putative untranslated regions are underlined and found upstream of the initiation codon and downstream from the termination codon.
Bst Thermostable Sulfurylase Nucleotide Sequence (SEQ ID NO: 1) GTTATGAAC ATGAGTTTGAGCATTCCGCATGGCGGCACATTGATCAACCGTTGGAATCGG 60 GATTACCCAATGGATGAAGCAACGAAAACGATGGAGGTGTCCAAAGCCGAAGTAAGCGAC 120 CTTGAGCTGATCGGCACAGGCGCCTACAGCCCGCTCACCGGGTTTTTAAGGAAAGCCGAT 180 TACGATGCGGTCGTAGAAACGATGCGCCTCGCTGATGGCACTGTCTGGAGCATTCCGATC 240 ACGCTGGCGGTGACGGAAGAAAAAGCGAGTGAACTCACTGTCGGCGACAAAGCGAAACTC 300 GTTTATGGCGGCGACGTCTAGGGCGTCATTGAAATCGCCGATATTTACCGCCCGGATAAA 360 ACGAAAGAAGCCAAGCTCGTCTATAAAACCGATGAACTCGCTCACCCGGGCGTGGGCAAG 420 CTGTTTGAAAAACCAGATGTGTAGGTCGGCGGAGCGGTTAGGCTCGTCAAACGGAGCGAC 480 AAAGGCCAGTTTGCTCCGTTTTATTTCGATCCGGCCGAAACGCGGAAACGATTTGCCGAA 540 CTCGGCTGGAATACCGTCGTGGGCTTCCAAACACGCAACCCGGTTCACCGGGCCCATGAA 600 TACATTCAAAAATGCGCGCTTGAAATCGTGGACGGCTTGTTTTTAAACCCGCTCGTCGGC 660 GAAACGAAAGCGGACGATATTCCGGCCGACATCCGGATGGAAAGCTATCAAGTGCTGCTG 720 GAAAACTATTATCCGAAAGACCGCGTTTTCTTGGGCGTCTTCCAAGCTGCGATGCGGTAT 780 GCCGGTCCGCGCGAAGCGATTTTCCATGCCATGGTGCGGAAAAACTTCGGCTGCACGCAC 840 TTCATCGTCGGCCGGGACCATGCGGGCGTCGGCAACTATTACGGCACGTATGATGCGCAA 900 AAAATCTTCTCGAACTTTACAGCCGAAGAGCTTGGCATTACACCGCTCTTTTTCGAACAC 960 AGCTTTTATTGCAGGAAATGCGAAGGGATGGCATCGAGGAAAACATGCCCGCACGACGCA 1020 CAATATCACGTTGTCCTTTCTGGCACGAAAGTCCGTGAAATGTTGCGTAACGGCCAAGTG 1080 CCGCCGAGCACATTCAGCCGTCCGGAAGTGGCCGGCGTTTTGATCAAAGGGCTGCAAGAA 1140 CGCGAAACGGTCACCCCGTCGACACGCTAA AGGAGGAGCGAGATGAGCACGAATATCGTT 1200 TGGCATCATACATCGGTGACAAAAGAAGATCGCCGCCAACGCAACGG 1247

The Bst sulfurylase polypeptide (SEQ ID NO:2) is 386 amino acid residues in length and is presented using the three letter amino acid code. Bst Sulfurylase Amino Acid Sequence (SEQ ID NO: 2) Met Ser Leu Ser Ile Pro His Gly Gly Thr Leu Ile   1               5                  10 Asn Arg Trp Asn Pro Asp Tyr Pro Ile Asp Glu Ala          15                  20 Thr Lys Thr Ile Glu Leu Ser Lys Ala Glu Leu Ser     25                  30                  35 Asp Leu Glu Leu Ile Gly Thr Gly Ala Tyr Ser Pro                  40                  45 Leu Thr Gly Phe Leu Thr Lys Ala Asp Tyr Asp Ala          50                  55 Val Val Glu Thr Met Arg Leu Ala Asp Gly Thr Val  60                  65                  70 Trp Ser Ile Pro Ile Thr Leu Ala Val Thr Glu Glu              75                  80 Lys Ala Ser Glu Leu Thr Val Gly Asp Lys Ala Lys     85                  90                  95 Leu Val Tyr Gly Gly Asp Val Tyr Gly Val Ile Glu                 100                 105 Ile Ala Asp Ile Tyr Arg Pro Asp Lys Thr Lys Glu         110                 115 Ala Lys Leu Val Tyr Lys Thr Asp Glu Leu Ala His 120                 125                 130 Pro Gly Val Arg Lys Leu Phe Glu Lys Pro Asp Val             135                 140 Tyr Val Gly Gly Ala Val Thr Leu Val Lys Arg Thr     145                 150                 155 Asp Lys Gly Gln Phe Ala Pro Phe Tyr Phe Asp Pro                 160                 165 Ala Glu Thr Arg Lys Arg Phe Ala Glu Leu Gly Trp         170                 175 Asn Thr Val Val Gly Phe Gln Thr Arg Asn Pro Val 180                 185                 190 His Arg Ala His Glu Tyr Ile Gln Lys Cys Ala Leu             195                 200 Glu Ile Val Asp Gly Leu Phe Leu Asn Pro Leu Val     205                 210                 215 Gly Glu Thr Lys Ala Asp Asp Ile Pro Ala Asp Ile                 220                 225 Arg Met Glu Ser Tyr Gln Val Leu Leu Glu Asn Tyr         230                 235 Tyr Pro Lys Asp Arg Val Phe Leu Gly Val Phe Gln 240        
         245                 250 Ala Ala Met Arg Tyr Ala Gly Pro Arg Glu Ala Ile             255                 260 Phe His Ala Met Val Arg Lys Asn Phe Gly Cys Thr     265                 270                 275 His Phe Ile Val Gly Arg Asp His Ala Gly Val Gly                 280                 285 Asn Tyr Tyr Gly Thr Tyr Asp Ala Gln Lys Ile Phe         290                 295 Ser Asn Phe Thr Ala Glu Glu Leu Gly Ile Thr Pro 300                 305                 310 Leu Phe Phe Glu His Ser Phe Tyr Cys Thr Lys Cys             315                 320 Glu Gly Met Ala Ser Thr Lys Thr Cys Pro His Asp     325                 330                 335 Ala Gln Tyr His Val Val Leu Ser Gly Thr Lys Val                 340                 345 Arg Glu Met Leu Arg Asn Gly Gln Val Pro Pro Ser         350                 355 Thr Phe Ser Arg Pro Glu Val Ala Ala Val Leu Ile 360                 365                 370 Lys Gly Leu Gln Glu Arg Glu Thr Val Thr Pro Ser             375                 380 Thr Arg     385

In one embodiment, the thermostable sulfurylase is active at temperatures above ambient to at least 50° C. This property is beneficial so that the sulfurylase will not be denatured at the higher temperatures commonly utilized in polymerase chain reaction (PCR) or sequencing reactions. In one embodiment, the ATP sulfurylase is from a thermophile. The thermostable sulfurylase can come from thermophilic bacteria, including but not limited to, Bacillus stearothermophilus, Thermus thermophilus, Bacillus caldolyticus, Bacillus subtilis, Bacillus thermoleovorans, Pyrococcus furiosus, Sulfolobus acidocaldarius, Rhodothermus obamensis, Aquifex aeolicus, Archaeoglobus fulgidus, Aeropyrum pernix, Pyrobaculum aerophilum, Pyrococcus abyssi, Penicillium chrysogenum, Sulfolobus solfataricus and Thermomonospora fusca.

The homology of twelve ATP sulfurylases can be shown graphically in the ClustalW analysis in Table 1. The alignment is of ATP sulfurylases from the following species: Bacillus stearothermophilus (Bst), University of Oklahoma—Strain 10 (Univ of OK), Aquifex aeolicus (Aae), Pyrococcus furiosus (Pfu), Sulfolobus solfataricus (Sso), Pyrobaculum aerophilum (Pae), Archaeoglobus fulgidus (Afu), Penicillium chrysogenum (Pch), Aeropyrum pernix (Ape), Saccharomyces cerevisiae (Sce), and Thermomonospora fusca (Tfu).

A thermostable sulfurylase polypeptide is encoded by the open reading frame (“ORF”) of a thermostable sulfurylase nucleic acid. An ORF corresponds to a nucleotide sequence that could potentially be translated into a polypeptide. A stretch of nucleic acids comprising an ORF is uninterrupted by a stop codon. An ORF that represents the coding sequence for a full protein begins with an ATG “start” codon and terminates with one of the three “stop” codons, namely, TAA, TAG, or TGA. For the purposes of this invention, an ORF may be any part of a coding sequence, with or without a start codon, a stop codon, or both. For an ORF to be considered as a good candidate for coding for a bona fide cellular protein, a minimum size requirement is often set, e.g., a stretch of DNA that would encode a protein of 50 amino acids or more.
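By way of non-limiting illustration, the ORF definition above can be sketched in Python. This is a minimal sketch, not the method used to identify the ORF of SEQ ID NO:1: the function name `find_orfs`, the `min_codons` parameter and the restriction to the three forward reading frames are assumptions made for the example. It scans for an ATG, reads in frame to the first TAA, TAG or TGA, and keeps candidates that would encode at least 50 amino acids.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=50):
    """Return (start, end) pairs, 0-based with end exclusive, for ATG..stop ORFs.

    Only the three forward reading frames of the given strand are scanned.
    min_codons counts amino-acid-encoding codons (the stop codon is excluded).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first in-frame start codon
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))  # include the stop codon
                start = None                   # look for the next ORF
    return orfs

# Hypothetical test sequence: ATG, 60 alanine codons, then a TAA stop.
example = "CC" + "ATG" + "GCT" * 60 + "TAA" + "GG"
print(find_orfs(example))
```

Raising `min_codons` makes the size filter stricter; a full analysis would also scan the reverse complement strand.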

The invention further encompasses nucleic acid molecules that differ from the nucleotide sequences shown in SEQ ID NO:1 due to degeneracy of the genetic code and thus encode the same thermostable sulfurylase proteins as that encoded by the nucleotide sequences shown in SEQ ID NO:1. In another embodiment, an isolated nucleic acid molecule of the invention has a nucleotide sequence encoding a protein having an amino acid sequence shown in SEQ ID NO:2. In addition to the thermostable sulfurylase nucleotide sequence shown in SEQ ID NO:1 it will be appreciated by those skilled in the art that DNA sequence polymorphisms that lead to changes in the amino acid sequences of the thermostable sulfurylase polypeptides may exist within a population (e.g., the bacterial population). Such genetic polymorphism in the thermostable sulfurylase genes may exist among individuals within a population due to natural allelic variation. As used herein, the terms “gene” and “recombinant gene” refer to nucleic acid molecules comprising an open reading frame encoding a thermostable sulfurylase protein. Such natural allelic variations can typically result in 1-5% variance in the nucleotide sequence of the thermostable sulfurylase genes. Any and all such nucleotide variations and resulting amino acid polymorphisms in the thermostable sulfurylase polypeptides, which are the result of natural allelic variation and that do not alter the functional activity of the thermostable sulfurylase polypeptides, are intended to be within the scope of the invention.
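As a hedged illustration of degeneracy, the following Python sketch translates two different nucleotide sequences with the standard genetic code and checks that they encode the same peptide. The helper names `translate` and `encode_same_protein` are hypothetical, and the second sequence is an invented synonymous variant of the first four codons of the ORF, not a sequence disclosed herein.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Translate a DNA coding sequence in frame 0; trailing partial codons are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

def encode_same_protein(dna_a, dna_b):
    """True when both coding sequences translate to the same amino acid sequence."""
    return translate(dna_a) == translate(dna_b)

original = "ATGAGTTTGAGC"   # Met Ser Leu Ser
variant = "ATGTCCCTCTCA"    # invented synonymous codons for the same residues
print(translate(original), translate(variant), encode_same_protein(original, variant))
```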

Moreover, nucleic acid molecules encoding thermostable sulfurylase proteins from other species, and thus that have a nucleotide sequence that differs from the sequence SEQ ID NO:1 are intended to be within the scope of the invention. Nucleic acid molecules corresponding to natural allelic variants and homologues of the thermostable sulfurylase nucleic acids of the invention can be isolated based on their homology to the thermostable sulfurylase nucleic acids disclosed herein using the disclosed nucleic acids, or a portion thereof, as a hybridization probe according to standard hybridization techniques under stringent hybridization conditions. The invention further includes the nucleic acid sequence of SEQ ID NO:1 and mature and variant forms thereof, wherein a first nucleotide sequence comprises a coding sequence differing by one or more nucleotides from a coding sequence encoding said amino acid sequence, provided that no more than 11% of the nucleotides in the coding sequence differ from the coding sequence.

Another aspect of the invention pertains to nucleic acid molecules encoding a thermostable sulfurylase protein that contains changes in amino acid residues that are not essential for activity. Such thermostable sulfurylase proteins differ in amino acid sequence from SEQ ID NO:2 yet retain biological activity. In separate embodiments, the isolated nucleic acid molecule comprises a nucleotide sequence encoding a protein, wherein the protein comprises an amino acid sequence at least about 96%, 97%, 98% or 99% homologous to the amino acid sequence of SEQ ID NO:2. An isolated nucleic acid molecule encoding a thermostable sulfurylase protein homologous to the protein of SEQ ID NO: 2 can be created by introducing one or more nucleotide substitutions, additions or deletions into the nucleotide sequence of SEQ ID NO:1 such that one or more amino acid substitutions, additions or deletions are introduced into the encoded protein.

Mutations can be introduced into SEQ ID NO:2 by standard techniques, such as site-directed mutagenesis and PCR-mediated mutagenesis. Preferably, conservative amino acid substitutions are made at one or more predicted, non-essential amino acid residues. A “conservative amino acid substitution” is one in which the amino acid residue is replaced with an amino acid residue having a similar side chain. Families of amino acid residues having similar side chains have been defined within the art. These families include amino acids with basic side chains (e.g., lysine, arginine, histidine), acidic side chains (e.g., aspartic acid, glutamic acid), uncharged polar side chains (e.g., glycine, asparagine, glutamine, serine, threonine, tyrosine, cysteine), nonpolar side chains (e.g., alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan), beta-branched side chains (e.g. threonine, valine, isoleucine) and aromatic side chains (e.g., tyrosine, phenylalanine, tryptophan, histidine). Thus, a predicted non-essential amino acid residue in the thermostable sulfurylase protein is replaced with another amino acid residue from the same side chain family. Alternatively, in another embodiment, mutations can be introduced randomly along all or part of a thermostable sulfurylase coding sequence, such as by saturation mutagenesis, and the resultant mutants can be screened for thermostable sulfurylase biological activity to identify mutants that retain activity. Following mutagenesis of SEQ ID NO:1, the encoded protein can be expressed by any recombinant technology known in the art and the activity of the protein can be determined.
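The side chain families listed above can be expressed as a simple lookup, sketched here in Python by way of example only. The dictionary keys and the function name `is_conservative` are assumptions introduced for illustration; the family memberships are taken from the paragraph above, converted to single letter codes. Because some residues belong to more than one family (for example threonine and histidine), a substitution is treated as conservative if the two residues share at least one family.

```python
# Side chain families from the text, written with single letter amino acid codes.
SIDE_CHAIN_FAMILIES = {
    "basic": set("KRH"),
    "acidic": set("DE"),
    "uncharged_polar": set("GNQSTYC"),
    "nonpolar": set("AVLIPFMW"),
    "beta_branched": set("TVI"),
    "aromatic": set("YFWH"),
}

def shared_families(residue_a, residue_b):
    """Return the names of the families that contain both residues."""
    a, b = residue_a.upper(), residue_b.upper()
    return [name for name, members in SIDE_CHAIN_FAMILIES.items()
            if a in members and b in members]

def is_conservative(residue_a, residue_b):
    """True when a substitution stays within at least one side chain family."""
    return bool(shared_families(residue_a, residue_b))

print(is_conservative("K", "R"), shared_families("T", "V"), is_conservative("L", "D"))
```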

The relatedness of amino acid families may also be determined based on side chain interactions. Substituted amino acids may be fully conserved “strong” residues or fully conserved “weak” residues. The “strong” group of conserved amino acid residues may be any one of the following groups: STA, NEQK, NHQK, NDEQ, QHRK, MILV, MILF, HY, FYW, wherein the single letter amino acid codes are grouped by those amino acids that may be substituted for each other. Likewise, the “weak” group of conserved residues may be any one of the following: CSA, ATV, SAG, STNK, STPA, SGND, SNDEQK, NDEQHK, NEQHRK, VLIM, HFY, wherein the letters within each group represent the single letter amino acid code.
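For illustration only, the “strong” and “weak” groups recited above can be used to classify a single substitution. In the following Python sketch the function name `classify_substitution` and the returned labels are hypothetical; the group strings are copied from the paragraph above. A pair is labelled strong if any strong group contains both residues, weak if only a weak group does, and none otherwise.

```python
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK",
               "NDEQHK", "NEQHRK", "VLIM", "HFY"]

def in_any_group(residue_a, residue_b, groups):
    """True when at least one group contains both residues."""
    return any(residue_a in g and residue_b in g for g in groups)

def classify_substitution(residue_a, residue_b):
    """Label a substitution as 'identical', 'strong', 'weak' or 'none'."""
    a, b = residue_a.upper(), residue_b.upper()
    if a == b:
        return "identical"
    if in_any_group(a, b, STRONG_GROUPS):
        return "strong"
    if in_any_group(a, b, WEAK_GROUPS):
        return "weak"
    return "none"

for pair in [("I", "L"), ("C", "S"), ("W", "G")]:
    print(pair, classify_substitution(*pair))
```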

The thermostable sulfurylase nucleic acid of the invention includes the nucleic acid whose sequence is provided herein, or fragments thereof. The invention also includes mutant or variant nucleic acids any of whose bases may be changed from the corresponding base shown herein while still encoding a protein that maintains its sulfurylase-like activities and physiological functions, or a fragment of such a nucleic acid. The invention further includes nucleic acids whose sequences are complementary to those just described, including nucleic acid fragments that are complementary to any of the nucleic acids just described. The invention additionally includes nucleic acids or nucleic acid fragments, or complements thereto, whose structures include chemical modifications. Such modifications include, by way of nonlimiting example, modified bases, and nucleic acids whose sugar phosphate backbones are modified or derivatized. These modifications are carried out at least in part to enhance the chemical stability of the modified nucleic acid, such that they may be used, for example, as antisense binding nucleic acids in therapeutic applications in a subject.

A thermostable sulfurylase nucleic acid can encode a mature thermostable sulfurylase polypeptide. As used herein, a “mature” form of a polypeptide or protein disclosed in the present invention is the product of a naturally occurring polypeptide or precursor form or proprotein. The naturally occurring polypeptide, precursor or proprotein includes, by way of nonlimiting example, the full-length gene product, encoded by the corresponding gene. Alternatively, it may be defined as the polypeptide, precursor or proprotein encoded by an ORF described herein. The product “mature” form arises, again by way of nonlimiting example, as a result of one or more naturally occurring processing steps as they may take place within the cell, or host cell, in which the gene product arises. Examples of such processing steps leading to a “mature” form of a polypeptide or protein include the cleavage of the N-terminal methionine residue encoded by the initiation codon of an ORF, or the proteolytic cleavage of a signal peptide or leader sequence. Thus a mature form arising from a precursor polypeptide or protein that has residues 1 to N, where residue 1 is the N-terminal methionine, would have residues 2 through N remaining after removal of the N-terminal methionine. Alternatively, a mature form arising from a precursor polypeptide or protein having residues 1 to N, in which an N-terminal signal sequence from residue 1 to residue M is cleaved, would have the residues from residue M+1 to residue N remaining. Further as used herein, a “mature” form of a polypeptide or protein may arise from a step of post-translational modification other than a proteolytic cleavage event. Such additional processes include, by way of non-limiting example, glycosylation, myristoylation or phosphorylation. In general, a mature polypeptide or protein may result from the operation of only one of these processes, or a combination of any of them.
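The residue arithmetic for the two proteolytic examples above can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the function names are hypothetical, the signal peptide length M is supplied by the caller rather than predicted, and the second example sequence is invented.

```python
def remove_initiator_methionine(precursor):
    """Residues 1..N with an N-terminal Met become residues 2..N."""
    if precursor[:1].upper() == "M":
        return precursor[1:]
    return precursor

def cleave_signal_sequence(precursor, m):
    """Remove a signal sequence spanning residues 1..M, leaving residues M+1..N."""
    if not 0 <= m < len(precursor):
        raise ValueError("M must be at least 0 and less than the precursor length N")
    return precursor[m:]

# First nine residues of SEQ ID NO:2 in single letter code: Met is removed.
print(remove_initiator_methionine("MSLSIPHGG"))
# Invented ten-residue precursor with an assumed six-residue signal sequence.
print(cleave_signal_sequence("MKKLLAASLS", 6))
```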

The term “isolated” nucleic acid molecule, as utilized herein, is one that is separated from other nucleic acid molecules which are present in the natural source of the nucleic acid. Preferably, an “isolated” nucleic acid is free of sequences which naturally flank the nucleic acid (i.e., sequences located at the 5′- and 3′-termini of the nucleic acid) in the genomic DNA of the organism from which the nucleic acid is derived. For example, in various embodiments, the isolated thermostable sulfurylase nucleic acid molecules can contain less than about 5 kb, 4 kb, 3 kb, 2 kb, 1 kb, 0.5 kb or 0.1 kb of nucleotide sequences which naturally flank the nucleic acid molecule in genomic DNA of the cell/tissue from which the nucleic acid is derived (e.g., brain, heart, liver, spleen, etc.). Moreover, an “isolated” nucleic acid molecule, such as a cDNA molecule, can be substantially free of other cellular material or culture medium when produced by recombinant techniques, or of chemical precursors or other chemicals when chemically synthesized.

A nucleic acid molecule of the invention, e.g., a nucleic acid molecule having the nucleotide sequence of SEQ ID NO:1 or a complement of this aforementioned nucleotide sequence, can be isolated using standard molecular biology techniques and the sequence information provided herein. Using all or a portion of the nucleic acid sequence of SEQ ID NO:1 as a hybridization probe, thermostable sulfurylase molecules can be isolated using standard hybridization and cloning techniques (e.g., as described in Sambrook, et al., (eds.), MOLECULAR CLONING: A LABORATORY MANUAL 2nd Ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989; and Ausubel, et al., (eds.), CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, John Wiley & Sons, New York, N.Y., 1993).

A nucleic acid of the invention can be amplified using cDNA, mRNA or alternatively, genomic DNA, as a template and appropriate oligonucleotide primers according to standard PCR amplification techniques. The nucleic acid so amplified can be cloned into an appropriate vector and characterized by DNA sequence analysis. Furthermore, oligonucleotides corresponding to thermostable sulfurylase nucleotide sequences can be prepared by standard synthetic techniques, e.g., using an automated DNA synthesizer.

As used herein, the term “complementary” refers to Watson-Crick or Hoogsteen base pairing between nucleotide units of a nucleic acid molecule, and the term “binding” means the physical or chemical interaction between two polypeptides or compounds or associated polypeptides or compounds or combinations thereof. Binding includes ionic, non-ionic, van der Waals, hydrophobic interactions, and the like. A physical interaction can be either direct or indirect. Indirect interactions may be through or due to the effects of another polypeptide or compound. Direct binding refers to interactions that do not take place through, or due to, the effect of another polypeptide or compound, but instead are without other substantial chemical intermediates.
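By way of example, Watson-Crick complementarity between two DNA strands can be computed as a reverse complement. The Python sketch below is an assumption-laden illustration: it handles only the four unmodified DNA bases, it does not model Hoogsteen pairing, and the function names are hypothetical.

```python
# Watson-Crick pairs for DNA: A with T, and G with C.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the complementary strand, written 5' to 3'."""
    return dna.upper().translate(_COMPLEMENT)[::-1]

def are_complementary(strand_a, strand_b):
    """True when strand_b is the exact reverse complement of strand_a."""
    return reverse_complement(strand_a) == strand_b.upper()

probe = "ATGAGTTTGAGC"          # first twelve ORF nucleotides of SEQ ID NO:1
print(reverse_complement(probe))
```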

Fragments provided herein are defined as sequences of at least 6 (contiguous) nucleic acids or at least 4 (contiguous) amino acids, a length sufficient to allow for specific hybridization in the case of nucleic acids or for specific recognition of an epitope in the case of amino acids, respectively, and are at most some portion less than a full length sequence. Fragments may be derived from any contiguous portion of a nucleic acid or amino acid sequence of choice. Derivatives are nucleic acid sequences or amino acid sequences formed from the native compounds either directly or by modification or partial substitution. Analogs are nucleic acid sequences or amino acid sequences that have a structure similar to, but not identical to, the native compound but differ from it in respect to certain components or side chains. Analogs may be synthetic or from a different evolutionary origin and may have a similar or opposite metabolic activity compared to wild type. Homologs are nucleic acid sequences or amino acid sequences of a particular gene that are derived from different species.

Derivatives and analogs may be full length or other than full length, if the derivative or analog contains a modified nucleic acid or amino acid, as described below. Derivatives or analogs of the nucleic acids or proteins of the invention include, but are not limited to, molecules comprising regions that are substantially homologous to the nucleic acids or proteins of the invention, in various embodiments, by at least about 89% identity over a nucleic acid or amino acid sequence of identical size or when compared to an aligned sequence in which the alignment is done by a computer homology program known in the art, or whose encoding nucleic acid is capable of hybridizing to the complement of a sequence encoding the aforementioned proteins under stringent, moderately stringent, or low stringent conditions. See e.g. Ausubel, et al., CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, John Wiley & Sons, New York, N.Y., 1993, and below.
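As a non-limiting sketch of the identity comparison over sequences of identical size, the following Python code counts matching positions and compares the result with the 89% figure recited above. The function names and the default threshold argument are assumptions for illustration; aligned sequences containing gaps would require an alignment program and are not handled here.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions that match between two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be of identical size")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)

def meets_identity_threshold(seq_a, seq_b, threshold=89.0):
    """True when the identity is at least the given percentage."""
    return percent_identity(seq_a, seq_b) >= threshold

# Invented ten-nucleotide example with a single mismatch at the last position.
print(percent_identity("ATGAGTTTGA", "ATGAGTTTGC"))
```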

A “homologous nucleic acid sequence” or “homologous amino acid sequence,” or variations thereof, refer to sequences characterized by a homology at the nucleotide level or amino acid level as discussed above. Homologous nucleotide sequences include those sequences coding for isoforms of thermostable sulfurylase polypeptides. Isoforms can be expressed in different tissues of the same organism as a result of, for example, alternative splicing of RNA. Alternatively, isoforms can be encoded by different genes. In the invention, homologous nucleotide sequences include nucleotide sequences encoding a thermostable sulfurylase polypeptide of species other than humans, including, but not limited to: vertebrates, and thus can include, e.g., frog, mouse, rat, rabbit, dog, cat, cow, horse, and other organisms. Homologous nucleotide sequences also include, but are not limited to, naturally occurring allelic variations and mutations of the nucleotide sequences set forth herein. Homologous nucleic acid sequences include those nucleic acid sequences that encode conservative amino acid substitutions in SEQ ID NO:1, as well as a polypeptide possessing thermostable sulfurylase biological activity. Various biological activities of the thermostable sulfurylase proteins are described below.

The thermostable sulfurylase proteins of the invention include the sulfurylase protein whose sequence is provided herein. The invention also includes mutant or variant proteins any of whose residues may be changed from the corresponding residue shown herein while still encoding a protein that maintains its sulfurylase-like activities and physiological functions, or a functional fragment thereof. The invention further encompasses antibodies and antibody fragments, such as Fab or (Fab)2, that bind immunospecifically to any of the proteins of the invention. This invention also includes a variant or a mature form of the amino acid sequence of SEQ ID NO:2, wherein the variant differs from the amino acid sequence of the mature form in one or more amino acid residues, provided that no more than 4% of the amino acid residues differ.

Several assays have been developed for detection of the forward ATP sulfurylase reaction. The colorimetric molybdolysis assay is based on phosphate detection (see e.g., Wilson and Bandurski, 1958. J. Biol. Chem. 233: 975-981), whereas the continuous spectrophotometric molybdolysis assay is based upon the detection of NADH oxidation (see e.g., Seubert, et al., 1983. Arch. Biochem. Biophys. 225: 679-691; Seubert, et al., 1985. Arch. Biochem. Biophys. 240: 509-523). The latter assay requires the presence of several detection enzymes.

Suitable enzymes for converting ATP into light include luciferases, e.g., insect luciferases. Luciferases produce light as an end-product of catalysis. The best known light-emitting enzyme is that of the firefly, Photinus pyralis (Coleoptera). The corresponding gene has been cloned and expressed in bacteria (see e.g., de Wet, et al., 1985. Proc. Natl. Acad. Sci. USA 82: 7870-7873) and plants (see e.g., Ow, et al., 1986. Science 234: 856-859), as well as in insect (see e.g., Jha, et al., 1990. FEBS Lett. 274: 24-26) and mammalian cells (see e.g., de Wet, et al., 1987. Mol. Cell. Biol. 7: 725-737; Keller, et al., 1987. Proc. Natl. Acad. Sci. USA 84: 3264-3268). In addition, a number of luciferase genes from the Jamaican click beetle, Pyrophorus plagiophthalamus (Coleoptera), have recently been cloned and partially characterized (see e.g., Wood, et al., 1989. J. Biolumin. Chemilumin. 4: 289-301; Wood, et al., 1989. Science 244: 700-702). Distinct luciferases can sometimes produce light of different wavelengths, which may enable simultaneous monitoring of light emissions at different wavelengths. Accordingly, these aforementioned characteristics are unique, and add new dimensions with respect to the utilization of current reporter systems.

Firefly luciferase catalyzes bioluminescence in the presence of luciferin, adenosine 5′-triphosphate (ATP), magnesium ions, and oxygen, resulting in a quantum yield of 0.88 (see e.g., McElroy and Selinger, 1960. Arch. Biochem. Biophys. 88: 136-145). The firefly luciferase bioluminescent reaction can be utilized as an assay for the detection of ATP with a detection limit of approximately 1×10⁻¹³ M (see e.g., Leach, 1981. J. Appl. Biochem. 3: 473-517). In addition, the overall degree of sensitivity and convenience of the luciferase-mediated detection systems have created considerable interest in the development of firefly luciferase-based biosensors (see e.g., Green and Kricka, 1984. Talanta 31: 173-176; Blum, et al., 1989. J. Biolumin. Chemilumin. 4: 543-550).

The development of new reagents has made it possible to obtain stable light emission proportional to the concentrations of ATP (see e.g., Lundin, 1982. Applications of firefly luciferase. In: Luminescent Assays (Raven Press, New York)). With such stable light emission reagents, it is possible to make endpoint assays and to calibrate each individual assay by addition of a known amount of ATP. In addition, a stable light-emitting system also allows continuous monitoring of ATP-converting systems.
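The calibration by addition of a known amount of ATP can be illustrated with simple internal-standard arithmetic. The Python sketch below is not a protocol from this disclosure: it assumes that the light signal is stable and strictly proportional to ATP over the measured range, that background has already been subtracted, and that the volume change on addition is negligible. The function and parameter names are hypothetical.

```python
def atp_from_internal_standard(signal_sample, signal_after_spike, atp_added):
    """Estimate the ATP in a sample from readings taken before and after a known ATP addition.

    signal_sample       light reading of the sample alone (arbitrary units)
    signal_after_spike  light reading after adding the ATP standard
    atp_added           known amount of ATP added (any unit; the result uses the same unit)
    """
    increase = signal_after_spike - signal_sample
    if increase <= 0:
        raise ValueError("the ATP addition must increase the light signal")
    signal_per_atp = increase / atp_added      # calibration factor for this assay
    return signal_sample / signal_per_atp

# Invented readings: 200 units before and 600 units after adding 1e-12 mol ATP.
print(atp_from_internal_standard(200.0, 600.0, 1.0e-12))
```

Because the calibration factor is measured in the same tube as the sample, assay-to-assay differences in reagent activity cancel out under the stated proportionality assumption.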

In a preferred embodiment, the ATP generating-ATP converting fusion protein is attached to an affinity tag. The term “affinity tag” is used herein to denote a peptide segment that can be attached to a polypeptide to provide for purification or detection of the polypeptide or provide sites for attachment of the polypeptide to a substrate. In principle, any peptide or protein for which an antibody or other specific binding agent is available can be used as an affinity tag. Affinity tags include a poly-histidine tract or a biotin carboxyl carrier protein (BCCP) domain, protein A (Nilsson et al., EMBO J. 4:1075, 1985; Nilsson et al., Methods Enzymol. 198:3, 1991), glutathione S transferase (Smith and Johnson, Gene 67:31, 1988), substance P, Flag™ peptide (Hopp et al., Biotechnology 6:1204-1210, 1988; available from Eastman Kodak Co., New Haven, Conn.), streptavidin binding peptide, or other antigenic epitope or binding domain. See, in general Ford et al., Protein Expression and Purification 2: 95-107, 1991. DNAs encoding affinity tags are available from commercial suppliers (e.g., Pharmacia Biotech, Piscataway, N.J.).

As used herein, the term “poly-histidine tag,” when used in reference to a fusion protein refers to the presence of two to ten histidine residues at either the amino- or carboxy-terminus of a protein of interest. A poly-histidine tract of six to ten residues is preferred. The poly-histidine tract is also defined functionally as being a number of consecutive histidine residues added to the protein of interest which allows the affinity purification of the resulting fusion protein on a nickel-chelate or IDA column.
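For illustration, a run of two to ten consecutive histidine residues near either terminus can be detected with a regular expression. In this Python sketch the `window` parameter (how many terminal residues are inspected) and the function names are assumptions, introduced because a tag may follow a short leader such as an initiator methionine; a run that extends beyond the window is counted only up to the window edge. The example sequences are illustrative.

```python
import re

def longest_his_run(segment):
    """Length of the longest stretch of consecutive histidines in a segment."""
    runs = re.findall(r"H+", segment)
    return max((len(run) for run in runs), default=0)

def terminal_his_runs(protein, window=12):
    """Longest His run within `window` residues of the N- and C-termini."""
    protein = protein.upper()
    return {"N": longest_his_run(protein[:window]),
            "C": longest_his_run(protein[-window:])}

def has_poly_his_tag(protein, min_len=2, max_len=10, window=12):
    """True when either terminus carries a His run of min_len to max_len residues."""
    runs = terminal_his_runs(protein, window)
    return any(min_len <= n <= max_len for n in runs.values())

tagged = "MRGSHHHHHHGMASMEAPAAAEISGHIVRSPMVGSFYRTP"   # illustrative tagged N-terminus
untagged = "MSLSIPHGGTLINRWNPDYPIDEATKTIELSKAEL"        # illustrative untagged sequence
print(terminal_his_runs(tagged), has_poly_his_tag(tagged), has_poly_his_tag(untagged))
```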

In some embodiments, the fusion protein has an orientation such that the sulfurylase polypeptide is N-terminal to the luciferase polypeptide. In other embodiments, the luciferase polypeptide is N-terminal to the sulfurylase polypeptide. As used herein, the term sulfurylase-luciferase fusion protein refers to either of these orientations. The terms “amino-terminal” (N-terminal) and “carboxyl-terminal” (C-terminal) are used herein to denote positions within polypeptides and proteins. Where the context allows, these terms are used with reference to a particular sequence or portion of a polypeptide or protein to denote proximity or relative position. For example, a certain sequence positioned carboxyl-terminal to a reference sequence within a protein is located proximal to the carboxyl terminus of the reference sequence, but is not necessarily at the carboxyl terminus of the complete protein.

The fusion protein of this invention can be produced by standard recombinant DNA techniques. For example, DNA fragments coding for the different polypeptide sequences are ligated together in-frame in accordance with conventional techniques, e.g. by employing blunt-ended or “sticky”-ended termini for ligation, restriction enzyme digestion to provide for appropriate termini, filling-in of cohesive ends as appropriate, alkaline phosphatase treatment to avoid undesirable joining, and enzymatic ligation. In another embodiment, the fusion gene can be synthesized by conventional techniques including automated DNA synthesizers. Alternatively, PCR amplification of gene fragments can be carried out using anchor primers that give rise to complementary overhangs between two consecutive gene fragments that can subsequently be annealed and reamplified to generate a chimeric gene sequence (see, for example, Ausubel et al. (eds.) CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, John Wiley & Sons, 1992). The two polypeptides of the fusion protein can also be joined by a linker, such as a unique restriction site, which is engineered with specific primers during the cloning procedure. In one embodiment, the sulfurylase and luciferase polypeptides are joined by a linker, for example an ala-ala-ala linker which is encoded by a NotI restriction site.
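The relationship between a NotI site and an ala-ala-ala linker can be checked with a short Python sketch. The NotI recognition sequence GCGGCCGC is eight nucleotides long, so one further base is needed to complete the third codon; because every GCN codon encodes alanine, any base will do when the site begins at a codon boundary. The function names are hypothetical and the choice of reading frame is an assumption of the example.

```python
NOTI_SITE = "GCGGCCGC"                       # NotI recognition sequence
ALANINE_CODONS = {"GCT", "GCC", "GCA", "GCG"}

def split_codons(dna):
    """Split a sequence into complete codons starting at its first base."""
    dna = dna.upper()
    return [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]

def encodes_ala_ala_ala(nine_bases):
    """True when the nine bases read in frame as three alanine codons."""
    codons = split_codons(nine_bases)
    return len(codons) == 3 and all(c in ALANINE_CODONS for c in codons)

for next_base in "ACGT":
    linker = NOTI_SITE + next_base
    print(linker, split_codons(linker), encodes_ala_ala_ala(linker))
```

Placing the same site one base out of frame (for example CGC GGC CGC) no longer encodes alanine, which is why the primers used to introduce the site must preserve the reading frame of both polypeptides.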

In one embodiment, the invention includes a recombinant polynucleotide that comprises a coding sequence for a fusion protein having an ATP generating polypeptide sequence and an ATP converting polypeptide sequence. In a preferred embodiment, the recombinant polynucleotide encodes a sulfurylase-luciferase fusion protein. The term “recombinant DNA molecule” or “recombinant polynucleotide” as used herein refers to a DNA molecule which is comprised of segments of DNA joined together by means of molecular biological techniques. The term “recombinant protein” or “recombinant polypeptide” as used herein refers to a protein molecule which is expressed from a recombinant DNA molecule.

In one aspect, this invention discloses a sulfurylase-luciferase fusion protein with an N-terminal hexahistidine tag and a BCCP tag. The nucleic acid sequence of the disclosed N-terminal hexahistidine-BCCP luciferase-sulfurylase gene (His6-BCCP L-S) gene is shown below: His6-BCCP L-S Nucleotide Sequence (SEQ ID NO: 3): ATGCGGGGTTCTCATCATCATCATCATCATGGTATGGCTAGCATGGAAGCGCCAGCAGCA 60 GCGGAAATCAGTGGTCACATCGTACGTTCCCCGATGGTTGGTAGTTTCTACCGCACCCCA 120 AGCCCGGACGCAAAAGCGTTCATCGAAGTGGGTCAGAAAGTCAACGTGGGCGATACCCTG 180 TGCATCGTTGAAGCCATGAAAATGATGAACCAGATCGAAGCGGACAAATCCGGTACCGTG 240 AAAGCAATTCTGGTCGAAAGTGGACAACCGGTAGAATTTGACGAGCCGCTGGTCGTCATC 300 GAGGGATCCGAGCTCGAGATCCAAATGGAAGACGCCAAAAACATAAAGAAAGGCCCGGCG 360 CCATTCTATCCTCTAGAGGATGGAACCGCTGGAGAGCAACTGCATAAGGCTATGAAGAGA 420 TACGCCCTGGTTCCTGGAACAATTGCTTTTACAGATGCACATATCGAGGTGAACATCACG 480 TACGCGGAATACTTCGAAATGTCCGTTCGGTTGGCAGAAGCTATGAAACGATATGGGCTG 540 AATACAAATCACAGAATCGTCGTATGCAGTGAAAACTCTCTTCAATTCTTTATGCCGGTG 600 TTGGGCGCGTTATTTATCGGAGTTGCAGTTGCGCCCGCGAACGACATTTATAATGAACGT 660 GAATTGCTCAACAGTATGAACATTTCGCAGCCTACCGTAGTGTTTGTTTCCAAAAAGGGG 720 TTGCAAAAAATTTTGAACGTGCAAAAAAAATTACCAATAATCCAGAAAATTATTATCATG 780 GATTCTAAAACGGATTACCAGGGATTTCAGTCGATGTACACGTTCGTCACATCTGATCTA 840 CCTCCCGGTTTTAATGAATACGATTTTGTACCAGAGTCCTTTGATCGTGACAAAACAATT 900 GCACTGATAATGAATTCCTCTGGATCTACTGGGTTACCTAAGGGTGTGGCCCTTCCGCAT 960 AGAACTGCCTGCGTCAGATTCTCGCATGCCAGAGATCCTATTTTTGGCAATCAAATCATT 1020 CCGGATACTGCGATTTTAAGTGTTGTTCCATTCCATCACGGTTTTGGAATGTTTACTACA 1080 CTCGGATATTTGATATGTGGATTTCGAGTCGTCTTAATGTATAGATTTGAAGAAGAGCTG 1140 TTTTTACGATCCCTTCAGGATTACAAAATTCAAAGTGCGTTGCTAGTACCAACCCTATTT 1200 TCATTCTTCGCCAAAAGCACTCTGATTGACAAATACGATTTATCTAATTTACACGAAATT 1260 GCTTCTGGGGGCGCACCTCTTTCGAAAGAAGTCGGGGAAGCGGTTGCAAAACGCTTCCAT 1320 CTTCCAGGGATACGACAAGGATATGGGCTCACTGAGACTACATCAGCTATTCTGATTACA 1380 CCCGAGGGGGATGATAAACCGGGCGCGGTCGGTAAAGTTGTTCCATTTTTTGAAGCGAAG 1440 GTTGTGGATCTGGATACCGGGAAAACGCTGGGCGTTAATCAGAGAGGCGAATTATGTGTC 1500 
AGAGGACCTATGATTATGTCCGGTTATGTAAACAATCCGGAAGCGACCAACGCCTTGATT 1560 GACAAGGATGGATGGCTACATTCTGGAGACATAGCTTACTGGGACGAAGACGAACACTTC 1620 TTCATAGTTGACCGCTTGAAGTCTTTAATTAAATACAAAGGATATCAGGTGGCCCCCGCT 1680 GAATTGGAATCGATATTGTTACAACACCCCAACATCTTCGACGCGGGCGTGGCAGGTCTT 1740 CCCGACGATGACGCCGGTGAACTTCCCGGCGCCGTTGTTGTTTTGGAGCACGGAAAGACG 1800 ATGACGGAAAAAGAGATCGTGGATTACGTCGCCAGTCAAGTAACAACCGCGAAAAAGTTG 1860 CGCGGAGGAGTTGTGTTTGTGGACGAAGTACCGAAAGGTCTTACCGGAAAACTCGACGCA 1920 AGAAAAATCAGAGAGATCCTCATAAAGGCCAAGAAGGGCGGAAAGTCCAAATTGGCGGCC 1980 GCTATGCCTGCTCCTCACGGTGGTATTCTACAAGACTTGATTGCTAGAGATGCGTTAAAG 2040 AAGAATGAATTGTTATCTGAAGCGCAATCTTCGGACATTTTAGTATGGAACTTGACTCCT 2100 AGACAACTATGTGATATTGAATTGATTCTAAATGGTGGGTTTTCTCCTCTGACTGGGTTT 2160 TTGAACGAAAACGATTACTCCTCTGTTGTTACAGATTCGAGATTAGCAGACGGCACATTG 2220 TGGACCATCCCTATTACATTAGATGTTGATGAAGCATTTGCTAACCAAATTAAACCAGAC 2280 ACAAGAATTGCCCTTTTCCAAGATGATGAAATTCCTATTGCTATACTTACTGTCCAGGAT 2340 GTTTACAAGCCAAACAAAACTATCGAAGCCGAAAAAGTCTTCAGAGGTGACCCAGAACAT 2400 CCAGCCATTAGCTATTTATTTAACGTTGCCGGTGATTATTACGTCGGCGGTTCTTTAGAA 2460 GCGATTCAATTACCTCAACATTATGACTATCCAGGTTTGCGTAAGACACCTGCCCAACTA 2520 AGACTTGAATTCCAATCAAGACAATGGGACCGTGTCGTAGCTTTCCAAACTCGTAATCCA 2580 ATGCATAGAGCCCACAGGGAGTTGACTGTGAGAGCCGCCAGAGAAGCTAATGCTAAGGTG 2640 CTGATCCATCCAGTTGTTGGACTAACCAAACCAGGTGATATAGACCATCACACTCGTGTT 2700 CGTGTCTACCAGGAAATTATTAAGCGTTATCCTAATGGTATTGCTTTCTTATCCCTGTTG 2760 CCATTAGCAATGAGAATGAGTGGTGATAGAGAAGCCGTATGGCATGCTATTATTAGAAAG 2820 AATTATGGTGCCTCCCACTTCATTGTTGGTAGAGACCATGCGGGCCCAGGTAAGAACTCC 2880 AAGGGTGTTGATTTCTACGGTCCATACGATGCTCAAGAATTGGTCGAATCCTACAAGCAT 2940 GAACTGGACATTGAAGTTGTTGCATTCAGAATGGTCACTTATTTGCCAGACGAAGACCGT 3000 TATGCTCCAATTGATCAAATTGACACCACAAAGACGAGAACCTTGAACATTTCAGGTACA 3060 GAGTTGAGACGCCGTTTAAGAGTTGGTGGTGAGATTCCTGAATGGTTCTCATATCCTGAA 3120 GTGGTTAAAATCCTAAGAGAATCCAACCCACCAAGACCAAAACAAGGTTTTTCAATTGTT 3180 TTAGGTAATTCATTAACCGTTTCTCGTGAGCAATTATCCATTGCTTTGTTGTCAACATTC 3240 TTGCAATTCGGTGGTGGCAGGTATTACAAGATCTTTGAACACAATAATAAGACAGAGTTA 3300 
CTATCTTTGATTCAAGATTTCATTGGTTCTGGTAGTGGACTAATTATTCCAAATCAATGG 3360 GAAGATGACAAGGACTCTGTTGTTGGCAAGCAAAACGTTTACTTATTAGATACCTCAAGC 3420 TCAGCCGATATTCAGCTAGAGTCAGCGGATGAACCTATTTCACATATTGTACAAAAAGTT 3480 GTCCTATTCTTGGAAGACAATGGCTTTTTTGTATTTTAA 3519

The amino acid sequence of the disclosed His6-BCCP L-S polypeptide is presented using the three letter amino acid code (SEQ ID NO:4). His6-BCCP L-S Amino Acid Sequence (SEQ ID NO: 4) Met Arg Gly Ser His His His His His His Gly Met   1               5                  10 Ala Ser Met Glu Ala Pro Ala Ala Ala Glu Ile Ser          15                  20 Gly His Ile Val Arg Ser Pro Met Val Gly Thr Phe  25                  30                  35 Tyr Arg Thr Pro Ser Pro Asp Ala Lys Ala Phe Ile          40                  45 Glu Val Gly Gln Lys Val Asn Val Gly Asp Thr Leu      50                  55                  60 Cys Ile Val Glu Ala Met Lys Met Met Asn Gln Ile                  65                  70 Glu Ala Asp Lys Ser Gly Thr Val Lys Ala Ile Leu          75                  80 Val Glu Ser Gly Gln Pro Val Glu Phe Asp Glu Pro  85                  90                  95 Leu Val Val Ile Glu Gly Ser Glu Leu Glu Ile Gln             100                 105 Met Glu Asp Ala Lys Asn Ile Lys Lys Gly Pro Ala     110                 115                 120 Pro Phe Tyr Pro Leu Glu Asp Gly Thr Ala Gly Glu                 125                 130 Gln Leu His Lys Ala Met Lys Arg Tyr Ala Leu Val         135                 140 Pro Gly Thr Ile Ala Phe Thr Asp Ala His Ile Glu 145                 150                 155 Val Asn Ile Thr Tyr Ala Glu Tyr Phe Glu Met Ser             160                 165 Val Arg Leu Ala Glu Ala Met Lys Arg Tyr Gly Leu     170                 175                 180 Asn Thr Asn His Arg Ile Val Val Cys Ser Glu Asn                 185                 190 Ser Leu Gln Phe Phe Met Pro Val Leu Gly Ala Leu         195                 200 Phe Ile Gly Val Ala Val Ala Pro Ala Asn Asp Ile 205                 210                 215 Tyr Asn Glu Arg Glu Leu Leu Asn Ser Met Asn Ile             220                 225 Ser Gln Pro Thr Val Val Phe Val Ser Lys Lys Gly     230                 235                 240 Leu Gln Lys Ile Leu Asn Val Gln Lys Lys Leu Pro              
   245                 250 Ile Ile Gln Lys Ile Ile Ile Met Asp Ser Lys Thr         255                 260 Asp Tyr Gln Gly Phe Gln Ser Met Tyr Thr Phe Val 265                 270                 275 Thr Ser His Leu Pro Pro Gly Phe Asn Glu Tyr Asp             280                 285 Phe Val Pro Glu Ser Phe Asp Arg Asp Lys Thr Ile     290                 295                 300 Ala Leu Ile Met Asn Ser Ser Gly Ser Thr Gly Leu                 305                 310 Pro Lys Gly Val Ala Leu Pro His Arg Thr Ala Cys         315                 320 Val Arg Phe Ser His Ala Arg Asp Pro Ile Phe Gly 325                 330                 335 Asn Gln Ile Ile Pro Asp Thr Ala Ile Leu Ser Val             340                 345 Val Pro Phe His His Gly Phe Gly Met Phe Thr Thr     350                 355                 360 Leu Gly Tyr Leu Ile Cys Gly Phe Arg Val Val Leu                 365                 370 Met Tyr Arg Phe Glu Glu Glu Leu Phe Leu Arg Ser         375                 380 Leu Gln Asp Tyr Lys Ile Gln Ser Ala Leu Leu Val 385                 390                 395 Pro Thr Leu Phe Ser Phe Phe Ala Lys Ser Thr Leu             400                 405 Ile Asp Lys Tyr Asp Leu Ser Asn Leu His Glu Ile     410                 415                 420 Ala Ser Gly Gly Ala Pro Leu Ser Lys Glu Val Gly                 425                 430 Glu Ala Val Ala Lys Arg Phe His Leu Pro Gly Ile         435                 440 Arg Gln Gly Tyr Gly Leu Thr Glu Thr Thr Ser Ala 445                 450                 455 Ile Leu Ile Thr Pro Glu Gly Asp Asp Lys Pro Gly             460                 465 Ala Val Gly Lys Val Val Pro Phe Phe Glu Ala Lys     470                 475                 480 Val Val Asp Leu Asp Thr Gly Lys Thr Leu Gly Val                 485                 490 Asn Gln Arg Gly Glu Leu Cys Val Arg Gly Pro Met         495                 500 Ile Met Ser Gly Tyr Val Asn Asn Pro Glu Ala Thr 505                 510                 515 Asn Ala Leu Ile Asp Lys Asp Gly Trp Leu 
His Ser             520                 525 Gly Asp Ile Ala Tyr Trp Asp Glu Asp Glu His Phe     530                 535                 540 Phe Ile Val Asp Arg Leu Lys Ser Leu Ile Lys Tyr                 545                 550 Lys Gly Tyr Gln Val Ala Pro Ala Glu Leu Glu Ser         555                 560 Ile Leu Leu Gln His Pro Asn Ile Phe Asp Ala Gly 565                 570                 575 Val Ala Gly Leu Pro Asp Asp Asp Ala Gly Glu Leu             580                 585 Pro Ala Ala Val Val Val Leu Glu His Gly Lys Thr     590                 595                 600 Met Thr Glu Lys Glu Ile Val Asp Tyr Val Ala Ser                 605                 610 Gln Val Thr Thr Ala Lys Lys Leu Arg Gly Gly Val         615                 620 Val Phe Val Asp Glu Val Pro Lys Gly Leu Thr Gly 625                 630                 635 Lys Leu Asp Ala Arg Lys Ile Arg Glu Ile Leu Ile             640                 645 Lys Ala Lys Lys Gly Gly Lys Ser Lys Leu Ala Ala     650                 655                 660 Ala Met Pro Ala Pro His Gly Gly Ile Leu Gln Asp                 665                 670 Leu Ile Ala Arg Asp Ala Leu Lys Lys Asn Glu Leu             675                 680 Leu Ser Glu Ala Gln Ser Ser Asp Ile Leu Val Trp     685                 690                 695 Asn Leu Thr Pro Arg Gln Leu Cys Asp Ile Glu Leu                 700                 705 Ile Leu Asn Gly Gly Phe Ser Pro Leu Thr Gly Phe         710                 715 Leu Asn Glu Asn Asp Tyr Ser Ser Val Val Thr Asp 720                 725                 730 Ser Arg Leu Ala Asp Gly Thr Leu Trp Thr Ile Pro             735                 740 Ile Thr Leu Asp Val Asp Glu Ala Phe Ala Asn Gln     745                 750                 755 Ile Lys Pro Asp Thr Arg Ile Ala Leu Phe Gln Asp                 760                 765 Asp Glu Ile Pro Ile Ala Ile Leu Thr Val Gln Asp         770                 775 Val Tyr Lys Pro Asn Lys Thr Ile Glu Ala Glu Lys 780                 785                 790 Val Phe Arg 
Gly Asp Pro Glu His Pro Ala Ile Ser             795                 800 Tyr Leu Phe Asn Val Ala Gly Asp Tyr Tyr Val Gly     805                 810                 815 Gly Ser Leu Glu Ala Ile Gln Leu Pro Gln His Tyr                 820                 825 Asp Tyr Pro Gly Leu Arg Lys Thr Pro Ala Gln Leu         830                 835 Arg Leu Glu Phe Gln Ser Arg Gln Trp Asp Arg Val 840                 845                 850 Val Ala Phe Gln Thr Arg Asn Pro Met His Arg Ala             855                 860 His Arg Glu Leu Thr Val Arg Ala Ala Arg Glu Ala     865                 870                 875 Asn Ala Lys Val Leu Ile His Pro Val Val Gly Leu                 880                 885 Thr Lys Pro Gly Asp Ile Asp His His Thr Arg Val         890                 895 Arg Val Tyr Gln Glu Ile Ile Lys Arg Tyr Pro Asn 900                 905                 910 Gly Ile Ala Phe Leu Ser Leu Leu Pro Leu Ala Met             915                 920 Arg Met Ser Gly Asp Arg Glu Ala Val Trp His Ala     925                 930                 935 Ile Ile Arg Lys Asn Tyr Gly Ala Ser His Phe Ile                 940                 945 Val Gly Arg Asp His Ala Gly Pro Gly Lys Asn Ser         950                 955 Lys Gly Val Asp Phe Tyr Gly Pro Tyr Asp Ala Gln 960                 965                 970 Glu Leu Val Glu Ser Tyr Lys His Glu Leu Asp Ile             975                 980 Glu Val Val Pro Phe Arg Met Val Thr Tyr Leu Pro     985                 990                 995 Asp Glu Asp Arg Tyr Ala Pro Ile Asp Gln Ile Asp                 1000                1005 Thr Thr Lys Thr Arg Thr Leu Asn Ile Ser Gly Thr         1010                1015 Glu Leu Arg Arg Arg Leu Arg Val Gly Gly Glu Ile 1020                1025                1030 Pro Glu Trp Phe Ser Tyr Pro Glu Val Val Lys Ile             1035                1040 Leu Arg Glu Ser Asn Pro Pro Arg Pro Lys Gln Gly     1045                1050                1055 Phe Ser Ile Val Leu Gly Asn Ser Leu Thr Val Ser                 
1060                1065 Arg Glu Gln Leu Ser Ile Ala Leu Leu Ser Thr Phe         1070                1075 Leu Gln Phe Gly Gly Gly Arg Tyr Tyr Lys Ile Phe 1080                1085                1090 Glu His Asn Asn Lys Thr Glu Leu Leu Ser Leu Ile             1095                1100 Gln Asp Phe Ile Gly Ser Gly Ser Gly Leu Ile Ile     1105                1110                1115 Pro Asn Gln Trp Glu Asp Asp Lys Asp Ser Val Val         1120                1125 Gly Lys Gln Asn Val Tyr Leu Leu Asp Thr Ser Ser         1130                1135 Ser Ala Asp Ile Gln Leu Glu Ser Ala Asp Glu Pro 1140                1145                1150 Ile Ser His Ile Val Gln Lys Val Val Leu Phe Leu             1155                1160 Glu Asp Asn Gly Phe Phe Val Phe     1165                1170
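The correspondence between the nucleotide listing (SEQ ID NO: 3) and the three letter amino acid listing (SEQ ID NO: 4) can be spot-checked by translating the open reading frame codon by codon. The following is a minimal sketch, assuming the standard genetic code; the names `CODON_TABLE`, `THREE_LETTER` and `translate` are hypothetical helpers introduced only for illustration and are not part of the disclosure.

```python
# Illustrative sketch only: translate an open reading frame into the
# three letter amino acid code, assuming the standard genetic code.
BASES = "TCAG"
# One letter amino acids for all 64 codons, ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def translate(orf):
    """Translate a DNA open reading frame, stopping at the first stop codon."""
    residues = []
    for start in range(0, len(orf) - 2, 3):
        amino_acid = CODON_TABLE[orf[start:start + 3].upper()]
        if amino_acid == "*":  # stop codon ends the polypeptide
            break
        residues.append(THREE_LETTER[amino_acid])
    return " ".join(residues)


# First 36 nucleotides of SEQ ID NO: 3 as listed above.
print(translate("ATGCGGGGTTCTCATCATCATCATCATCATGGTATG"))
# prints: Met Arg Gly Ser His His His His His His Gly Met
```

The output reproduces residues 1 to 12 of SEQ ID NO: 4, including the hexahistidine tag at positions 5 to 10. The same function can be applied to the full listing once the position numbers and spaces are stripped.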

Accordingly, in one aspect, the invention provides for a fusion protein comprising a thermostable sulfurylase joined to at least one affinity tag. The nucleic acid sequence of the disclosed N-terminal hexahistidine-BCCP Bst ATP Sulfurylase (His6-BCCP Bst Sulfurylase) gene is shown below: His6-BCCP Bst Sulfurylase Nucleotide Sequence (SEQ ID NO: 5) ATGCGGGGTTCTCATGATCATCATCATCATGGTATGGCTAGCATGGAAGGGCCAGCAGCA 60 GCGGAAATCAGTGGTCACATCGTACGTTCCCCGATGGTTGGTACTTTCTACCGCACCCCA 120 AGCCCGGACGCAAAAGCGTTCATCGAAGTGGGTCAGAAAGTCAACGTGGGCGATACCCTG 180 TGCATCGTTGAAGCCATGAAAATGATGAACCAGATCGAAGCGGACAAATCCGGTACCGTG 240 AAAGCAATTCTGGTCGAAAGTGGACAACCGGTAGAATTTGACGAGCCGCTGGTCGTCATC 300 GAGGGATCCGAGCTCGAGATCTGCAGCATGAGCGTAAGCATCCCGCATGGCGGCACATTG 360 ATCAACCGTTGGAATCCGGATTACCCAATCGATGAAGCAACGAAAACGATCGAGCTGTCC 420 AAAGCCGAACTAAGCGACCTTGAGCTGATCGGCACAGGCGCCTACAGCCCGCTCACCGGG 480 TTTTTAACGAAAGCCGATTACGATGCGGTCGTAGAAACGATGCGCCTCGCTGATGGCACT 540 GTCTGGAGCATTCCGATCACGCTGGCGGTGACGGAAGAAAAAGCGAGTGAACTCACTGTC 600 GGCGACAAAGCGAAACTCGTTTATGGCGGCGACGTCTACGGCGTCATTGAAATCGCCGAT 660 ATTTACCGCCCGGATAAAACGAAAGAAGCCAAGCTCGTCTATAAAACCGATGAACTCGCT 720 CACCCGGGCGTGCGCAAGCTGTTTGAAAAACCAGATGTGTACGTCGGCGGAGCGGTTACG 780 CTCGTCAAACGGACCGACAAAGGCCAGTTTGCTCCGTTTTATTTCGATCCGGCCGAAACG 840 CGGAAACGATTTGCCGAACTCGGCTGGAATACCGTCGTCGGCTTCCAAACACGCAACCCG 900 GTTCACCGCGCCCATGAATACATTCAAAAATGCGCGCTTGAAATCGTGGACGGCTTGTTT 960 TTAAACCCGCTCGTCGGCGAAACGAAAGCGGACGATATTCCGGCCGACATCCGGATGGAA 1020 AGCTATCAAGTGCTGCTGGAAAACTATTATCCGAAAGACCGCGTTTTCTTGGGCGTCTTC 1080 CAAGCTGCGATGCGCTATGCCGGTCCGCGCGAAGCGATTTTCCATGCCATGGTGCGGAAA 1140 AACTTCGGCTGCACGCACTTCATCGTCGGCCGCGACCATGCGGGCGTCGGCAACTATTAC 1200 GGCACGTATGATGCGCAAAAAATCTTCTCGAACTTTACAGCCGAAGAGCTTGGCATTACA 1260 CCGCTCTTTTTCGAACACAGCTTTTATTGCACGAAATGCGAAGGCATGGCATCGACGAAA 1320 ACATGCCCGCACGACGCACAATATCACGTTGTCCTTTCTGGCACGAAAGTCCGTGAAATG 1380 TTGCGTAACGGCCAAGTGCCGCCGAGCACATTCAGCCGTCCGGAAGTGGCCGCCGTTTTG 1440 ATCAAAGGGCTGCAAGAACGCGAAACGGTCGCCCCGTCAGCGGGCTAA 1488

The amino acid sequence of the His6-BCCP Bst Sulfurylase polypeptide is presented using the three letter amino acid code in Table 6 (SEQ ID NO:6). 

1-221. (canceled)
 222. A method of determining the base sequence of a plurality of single stranded template nucleotides on an array, the method comprising: (a) providing a planar surface comprising at least 400,000 discrete cavities, wherein each cavity forms a reaction chamber containing single-stranded nucleic acid templates of a single species, wherein the reaction chambers have a center to center spacing of between 5 to 200 μm, wherein each reaction chamber contains a reaction mixture comprising a template-directed nucleotide polymerase and one of said plurality of single-stranded template nucleotides hybridized to a complementary oligonucleotide primer strand at least one nucleotide residue shorter than the single-stranded template nucleotides to form at least one unpaired nucleotide residue in each template at the 3′-end of the primer strand; (b) adding an activated nucleotide 5′-triphosphate precursor of one known nitrogenous base to the reaction chambers under conditions which allow incorporation of the activated nucleoside 5′-triphosphate precursor onto the 3′-end of the primer strand, provided the nitrogenous base of the activated nucleoside 5′-triphosphate precursor is complementary to the nitrogenous base of the unpaired nucleotide residue of the templates; (c) detecting whether or not the nucleoside 5′-triphosphate precursor was incorporated into the primer strands in each reaction chamber by detecting a sequencing byproduct with an ATP generating polypeptide-ATP converting polypeptide fusion protein or an ATP generating protein and an ATP converting protein, thus indicating that the unpaired nucleotide residue of the template has a nitrogenous base composition that is complementary to that of the incorporated nucleoside 5′-triphosphate precursor in each reaction chamber; (d) sequentially repeating steps (b) and (c), wherein each sequential repetition adds, and detects the incorporation of, one type of activated nucleoside 5′-triphosphate precursor of known nitrogenous base composition; and (e) determining the base sequence of the unpaired nucleotide residues of the template in each reaction chamber from the sequence of incorporation of said nucleoside precursors.
 223. The method of claim 222 wherein said sequencing byproduct is pyrophosphate.
 224. The method of claim 222 wherein the ATP generating polypeptide-ATP converting polypeptide fusion protein comprises an ATP generating polypeptide portion with an amino acid sequence which is at least 96% homologous to SEQ ID NO:2.
 225. The method of claim 222 wherein the ATP generating polypeptide-ATP converting polypeptide fusion protein comprises an ATP generating polypeptide portion with an amino acid sequence which is SEQ ID NO:6.
 226. The method of claim 222 wherein the ATP generating polypeptide-ATP converting polypeptide fusion protein comprises an amino acid sequence of SEQ ID NO:4.
 227. The method of claim 222 wherein the ATP generating protein comprises an amino acid sequence which is at least 96% homologous to SEQ ID NO:2.
 228. The method of claim 222 wherein the ATP generating protein comprises an amino acid sequence of SEQ ID NO:2 or SEQ ID NO:6.
 229. The method of claim 222 wherein said ATP generating polypeptide-ATP converting polypeptide fusion protein comprises an amino acid sequence encoded by a polynucleotide with an open reading frame of SEQ ID NO:3.
 230. The method of claim 222 wherein said ATP generating polypeptide comprises an amino acid sequence encoded by a polynucleotide with an open reading frame which is no more than 11% different from an open reading frame of SEQ ID NO:1.
 231. The method of claim 222 wherein said ATP generating polypeptide comprises an amino acid sequence encoded by an open reading frame of SEQ ID NO:1 or SEQ ID NO:5.
 232. The method of claim 222 wherein said ATP generating polypeptide-ATP converting polypeptide fusion protein or said ATP generating protein further comprises an affinity tag.
 233. The method of claim 222 wherein said ATP generating polypeptide-ATP converting polypeptide fusion protein, said ATP generating protein, or said ATP converting polypeptide is bound to a bead.
 234. A method of identifying a base at a target position in a sample nucleic acid sequence, comprising providing a sample nucleic acid and a primer which hybridizes to the sample nucleic acid immediately adjacent to the target position, subjecting the sample nucleic acid and primer to a polymerase reaction in the presence of a nucleotide whereby the nucleotide will only become incorporated if it is complementary to the base in the target position, and detecting said incorporation of the nucleotide by monitoring the release of inorganic pyrophosphate, whereby detection of incorporation of said nucleotide is indicative of identification of a base at a target position that is complementary to said nucleotide, and wherein the release of inorganic pyrophosphate is detected using a thermostable sulfurylase-luciferase fusion protein or a thermostable sulfurylase.
 235. The method of claim 234 wherein the thermostable sulfurylase-luciferase fusion protein or the thermostable sulfurylase comprises an amino acid sequence having at least 96% homology to SEQ ID NO:2, SEQ ID NO:4 or SEQ ID NO:6.
 236. The method of claim 234 wherein the thermostable sulfurylase-luciferase fusion protein or the thermostable sulfurylase is encoded by an open reading frame of SEQ ID NO: 1, 3 or 5.
 237. The method of claim 234 wherein the thermostable sulfurylase-luciferase fusion protein or the thermostable sulfurylase further comprises an affinity tag.
 238. The method of claim 234 wherein the thermostable sulfurylase-luciferase fusion protein or the thermostable sulfurylase is bound to a bead.
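
By way of illustration only, the cyclic logic of steps (b) through (e) of claim 222 can be sketched as a toy simulation for a single reaction chamber. The sketch below is an assumption-laden simplification and not the claimed method: the function names `run_flows` and `call_template`, the flow order "TACG", and the idealized integer signal (one unit per incorporated nucleotide, so that homopolymer runs give proportionally larger signals) are all hypothetical and do not model the sulfurylase-luciferase light signal itself.

```python
# Toy model of sequencing by synthesis in one reaction chamber.
# Assumption: each flow yields an idealized integer count of incorporations.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def run_flows(unpaired_template, flow_order="TACG", max_cycles=50):
    """Simulate steps (b) to (d): add one nucleotide type per flow and
    record how many residues were incorporated.

    unpaired_template lists the unpaired template bases in the order the
    polymerase encounters them.
    """
    position = 0
    flows = []
    for _ in range(max_cycles):
        for nucleotide in flow_order:
            incorporated = 0
            # Incorporation occurs only while the added base is complementary
            # to the next unpaired template base.
            while (position < len(unpaired_template)
                   and COMPLEMENT[unpaired_template[position]] == nucleotide):
                incorporated += 1
                position += 1
            flows.append((nucleotide, incorporated))
        if position == len(unpaired_template):
            break  # template fully copied
    return flows


def call_template(flows):
    """Step (e): rebuild the template from the order of incorporation."""
    synthesized = "".join(base * count for base, count in flows)
    return "".join(COMPLEMENT[base] for base in synthesized)


flows = run_flows("GATTC")
print(flows)
print(call_template(flows))  # prints: GATTC
```

In this sketch a flow with a count of zero corresponds to a negative detection in step (c), and the template is recovered in step (e) by complementing the nucleotides in the order in which they were incorporated.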